Overview

Dataset statistics

Number of variables: 19
Number of observations: 494778
Missing cells: 1147937
Missing cells (%): 12.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 71.7 MiB
Average record size in memory: 152.0 B
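As a consistency check, the missing-cell percentage and the total memory figure can be re-derived from the other overview numbers. The sketch below is plain arithmetic on the values reported above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 19
n_observations = 494_778
missing_cells = 1_147_937
avg_record_bytes = 152.0

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# One MiB is 2**20 bytes.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Approximate size in memory: {total_mib:.1f} MiB")
```

Both results agree with the reported 12.2% and 71.7 MiB after rounding to one decimal place.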

Variable types

Numeric: 10
Categorical: 8
Unsupported: 1

Alerts

Date has a high cardinality: 577 distinct values (High cardinality)
Sales is highly correlated with Customers and 1 other field (High correlation)
DayOfWeek is highly correlated with Open (High correlation)
Customers is highly correlated with Sales and 1 other field (High correlation)
Open is highly correlated with Sales and 2 other fields (High correlation)
Open is highly correlated with Sales and 1 other field (High correlation)
Promo2 is highly correlated with PromoInterval (High correlation)
StoreType is highly correlated with Assortment (High correlation)
PromoInterval is highly correlated with Promo2 (High correlation)
Assortment is highly correlated with StoreType (High correlation)
df_index is highly correlated with Store (High correlation)
Sales is highly correlated with DayOfWeek and 3 other fields (High correlation)
Store is highly correlated with df_index (High correlation)
Assortment is highly correlated with StoreType and 1 other field (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
DayOfWeek is highly correlated with Sales and 1 other field (High correlation)
Promo is highly correlated with Sales (High correlation)
Sales has 14813 (3.0%) missing values (Missing)
CompetitionOpenSinceMonth has 157257 (31.8%) missing values (Missing)
CompetitionOpenSinceYear has 157257 (31.8%) missing values (Missing)
Promo2SinceWeek has 242663 (49.0%) missing values (Missing)
Promo2SinceYear has 242663 (49.0%) missing values (Missing)
PromoInterval has 242663 (49.0%) missing values (Missing)
DayOfWeek has 14838 (3.0%) missing values (Missing)
Customers has 14900 (3.0%) missing values (Missing)
Open has 14775 (3.0%) missing values (Missing)
Promo has 14915 (3.0%) missing values (Missing)
StateHoliday has 14837 (3.0%) missing values (Missing)
SchoolHoliday has 15033 (3.0%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
StateHoliday is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Sales has 81986 (16.6%) zeros (Zeros)
Customers has 82004 (16.6%) zeros (Zeros)
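The per-variable missing counts can be reconciled with the overview total of 1147937 missing cells. The sketch below sums the counts listed in the alerts, plus the 1323 missing values that the CompetitionDistance section reports without an alert. The dictionary is only a transcription of figures from this report.

```python
# Missing-value counts transcribed from this report.
missing_by_variable = {
    "Sales": 14_813,
    "CompetitionOpenSinceMonth": 157_257,
    "CompetitionOpenSinceYear": 157_257,
    "Promo2SinceWeek": 242_663,
    "Promo2SinceYear": 242_663,
    "PromoInterval": 242_663,
    "DayOfWeek": 14_838,
    "Customers": 14_900,
    "Open": 14_775,
    "Promo": 14_915,
    "StateHoliday": 14_837,
    "SchoolHoliday": 15_033,
    # Reported in the CompetitionDistance section; no alert is raised for it.
    "CompetitionDistance": 1_323,
}

total_missing = sum(missing_by_variable.values())
print(total_missing)

# Rows where Promo2 equals 0, taken from the Promo2 section.
promo2_zero_rows = 242_663
promo2_fields = ("Promo2SinceWeek", "Promo2SinceYear", "PromoInterval")
same_as_promo2_zero = all(
    missing_by_variable[name] == promo2_zero_rows for name in promo2_fields
)
print(same_as_promo2_zero)
```

The three Promo2-related fields are each missing in exactly as many rows as Promo2 is 0. This suggests, but does not prove, that those values are structurally absent rather than lost.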

Reproduction

Analysis started: 2021-11-07 10:16:35.622372
Analysis finished: 2021-11-07 10:18:18.833957
Duration: 1 minute and 43.21 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
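The reported duration follows directly from the two timestamps. A minimal check using only the Python standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction section.
started = datetime.fromisoformat("2021-11-07 10:16:35.622372")
finished = datetime.fromisoformat("2021-11-07 10:18:18.833957")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```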

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 494778
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 309093.2235
Minimum: 1
Maximum: 618471
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 30851.7
Q1: 154414.5
Median: 309105.5
Q3: 463615.75
95-th percentile: 587360.15
Maximum: 618471
Range: 618470
Interquartile range (IQR): 309201.25

Descriptive statistics

Standard deviation: 178489.2561
Coefficient of variation (CV): 0.5774609163
Kurtosis: -1.199783755
Mean: 309093.2235
Median Absolute Deviation (MAD): 154599
Skewness: 0.0004499433344
Sum: 1.529325269 × 10^11
Variance: 3.185841453 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
2049 | 1 | < 0.1%
320457 | 1 | < 0.1%
492453 | 1 | < 0.1%
494500 | 1 | < 0.1%
504739 | 1 | < 0.1%
506786 | 1 | < 0.1%
500641 | 1 | < 0.1%
502688 | 1 | < 0.1%
414623 | 1 | < 0.1%
416670 | 1 | < 0.1%
Other values (494768) | 494768 | > 99.9%

Minimum values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
618471 | 1 | < 0.1%
618470 | 1 | < 0.1%
618469 | 1 | < 0.1%
618468 | 1 | < 0.1%
618467 | 1 | < 0.1%
618466 | 1 | < 0.1%
618465 | 1 | < 0.1%
618463 | 1 | < 0.1%
618462 | 1 | < 0.1%
618461 | 1 | < 0.1%
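Several of the df_index statistics are derived from others, so they can be cross-checked. The sketch below recomputes the coefficient of variation, IQR, range and variance from the reported figures. For comparison it also computes the standard deviation of a continuous uniform distribution over the same range, which is a textbook formula and not something the report states.

```python
import math

# Reported statistics for df_index.
mean = 309093.2235
std = 178489.2561
q1, q3 = 154414.5, 463615.75
minimum, maximum = 1, 618471

cv = std / mean                 # coefficient of variation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum
variance = std ** 2

# A continuous uniform distribution on [a, b] has std (b - a) / sqrt(12).
uniform_std = (maximum - minimum) / math.sqrt(12)

print(f"CV={cv:.6f} IQR={iqr} range={value_range}")
print(f"variance={variance:.4e} uniform_std={uniform_std:.1f}")
```

The uniform reference value lies within about 0.03% of the reported standard deviation, which is consistent with the Uniform flag on this variable.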

Sales
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 19017
Distinct (%): 4.0%
Missing: 14813
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5666.719767
Minimum: 0
Maximum: 38037
Zeros: 81986
Zeros (%): 16.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3638
Median: 5626
Q3: 7714
95-th percentile: 11947
Maximum: 38037
Range: 38037
Interquartile range (IQR): 4076

Descriptive statistics

Standard deviation: 3805.148502
Coefficient of variation (CV): 0.6714905021
Kurtosis: 1.914747986
Mean: 5666.719767
Median Absolute Deviation (MAD): 2042
Skewness: 0.6777930813
Sum: 2719827153
Variance: 14479155.12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
0 | 81986 | 16.6%
5680 | 102 | < 0.1%
5483 | 101 | < 0.1%
6049 | 100 | < 0.1%
5200 | 98 | < 0.1%
5197 | 98 | < 0.1%
5194 | 96 | < 0.1%
5636 | 96 | < 0.1%
5489 | 96 | < 0.1%
4828 | 96 | < 0.1%
Other values (19007) | 397096 | 80.3%
(Missing) | 14813 | 3.0%

Minimum values

Value | Count | Frequency (%)
0 | 81986 | 16.6%
133 | 1 | < 0.1%
286 | 1 | < 0.1%
297 | 1 | < 0.1%
416 | 1 | < 0.1%
506 | 1 | < 0.1%
538 | 1 | < 0.1%
541 | 1 | < 0.1%
552 | 1 | < 0.1%
555 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
38037 | 1 | < 0.1%
37403 | 1 | < 0.1%
35909 | 1 | < 0.1%
35350 | 1 | < 0.1%
34904 | 1 | < 0.1%
34814 | 1 | < 0.1%
34692 | 2 | < 0.1%
34475 | 1 | < 0.1%
34369 | 1 | < 0.1%
34001 | 1 | < 0.1%
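Zero sales make up a sixth of the rows, so the reported mean mixes zero and non-zero records. The sketch below first reproduces the reported mean from the sum and the non-missing count, then derives the mean over non-zero records. Splitting on zero is an illustrative choice here, not something the report does.

```python
# Reported figures for Sales.
n_rows = 494_778
missing = 14_813
zeros = 81_986
total_sales = 2_719_827_153

non_missing = n_rows - missing
mean_all = total_sales / non_missing            # should match the reported mean

# Share of zeros among rows that actually have a Sales value.
zero_share_non_missing = 100 * zeros / non_missing

# Mean over rows with a non-zero Sales value.
non_zero = non_missing - zeros
mean_non_zero = total_sales / non_zero

print(f"mean over non-missing rows: {mean_all:.2f}")
print(f"zeros among non-missing rows: {zero_share_non_missing:.1f}%")
print(f"mean over non-zero rows: {mean_non_zero:.1f}")
```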

Store
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1115
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 557.8484512
Minimum: 1
Maximum: 1115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 56
Q1: 279
Median: 558
Q3: 836
95-th percentile: 1059
Maximum: 1115
Range: 1114
Interquartile range (IQR): 557

Descriptive statistics

Standard deviation: 321.7982629
Coefficient of variation (CV): 0.5768560659
Kurtosis: -1.19993654
Mean: 557.8484512
Median Absolute Deviation (MAD): 279
Skewness: 0.0001858995185
Sum: 276011141
Variance: 103554.122
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
773 | 479 | 0.1%
763 | 478 | 0.1%
898 | 474 | 0.1%
166 | 473 | 0.1%
486 | 472 | 0.1%
219 | 472 | 0.1%
393 | 472 | 0.1%
793 | 471 | 0.1%
374 | 470 | 0.1%
442 | 470 | 0.1%
Other values (1105) | 490047 | 99.0%

Minimum values

Value | Count | Frequency (%)
1 | 464 | 0.1%
2 | 456 | 0.1%
3 | 444 | 0.1%
4 | 446 | 0.1%
5 | 463 | 0.1%
6 | 451 | 0.1%
7 | 459 | 0.1%
8 | 441 | 0.1%
9 | 430 | 0.1%
10 | 447 | 0.1%

Maximum values

Value | Count | Frequency (%)
1115 | 447 | 0.1%
1114 | 431 | 0.1%
1113 | 457 | 0.1%
1112 | 443 | 0.1%
1111 | 456 | 0.1%
1110 | 443 | 0.1%
1109 | 445 | 0.1%
1108 | 444 | 0.1%
1107 | 405 | 0.1%
1106 | 437 | 0.1%

StoreType
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: d
5th row: b

Common Values

Value | Count | Frequency (%)
a | 267306 | 54.0%
d | 153920 | 31.1%
c | 65990 | 13.3%
b | 7562 | 1.5%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 267306 | 54.0%
d | 153920 | 31.1%
c | 65990 | 13.3%
b | 7562 | 1.5%

Assortment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: c
2nd row: c
3rd row: a
4th row: c
5th row: b

Common Values

Value | Count | Frequency (%)
a | 262625 | 53.1%
c | 228131 | 46.1%
b | 4022 | 0.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 262625 | 53.1%
c | 228131 | 46.1%
b | 4022 | 0.8%

CompetitionDistance
Real number (ℝ≥0)

Distinct: 654
Distinct (%): 0.1%
Missing: 1323
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5409.264553
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 20
5-th percentile: 140
Q1: 710
Median: 2320
Q3: 6880
95-th percentile: 20260
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6170

Descriptive statistics

Standard deviation: 7674.370685
Coefficient of variation (CV): 1.418745674
Kurtosis: 13.00397241
Mean: 5409.264553
Median Absolute Deviation (MAD): 1970
Skewness: 2.924543913
Sum: 2669228640
Variance: 58895965.42
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
250 | 5374 | 1.1%
1200 | 3886 | 0.8%
50 | 3587 | 0.7%
350 | 3586 | 0.7%
190 | 3572 | 0.7%
330 | 3138 | 0.6%
180 | 3130 | 0.6%
150 | 3092 | 0.6%
90 | 3082 | 0.6%
1070 | 2725 | 0.6%
Other values (644) | 458283 | 92.6%

Minimum values

Value | Count | Frequency (%)
20 | 440 | 0.1%
30 | 1788 | 0.4%
40 | 2238 | 0.5%
50 | 3587 | 0.7%
60 | 1359 | 0.3%
70 | 2260 | 0.5%
80 | 1333 | 0.3%
90 | 3082 | 0.6%
100 | 2279 | 0.5%
110 | 2636 | 0.5%

Maximum values

Value | Count | Frequency (%)
75860 | 447 | 0.1%
58260 | 444 | 0.1%
48330 | 451 | 0.1%
46590 | 455 | 0.1%
45740 | 427 | 0.1%
44320 | 454 | 0.1%
40860 | 442 | 0.1%
40540 | 445 | 0.1%
38710 | 455 | 0.1%
38630 | 444 | 0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 157257
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.226048157
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
Median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.211344325
Coefficient of variation (CV): 0.4444122506
Kurtosis: -1.243925006
Mean: 7.226048157
Median Absolute Deviation (MAD): 3
Skewness: -0.1715032332
Sum: 2438943
Variance: 10.31273237
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common Values

Value | Count | Frequency (%)
9 | 55551 | 11.2%
4 | 41762 | 8.4%
11 | 40962 | 8.3%
3 | 31055 | 6.3%
7 | 29457 | 6.0%
12 | 28335 | 5.7%
10 | 27053 | 5.5%
6 | 22204 | 4.5%
5 | 19378 | 3.9%
2 | 18266 | 3.7%
Other values (2) | 23498 | 4.7%
(Missing) | 157257 | 31.8%

Minimum values

Value | Count | Frequency (%)
1 | 6170 | 1.2%
2 | 18266 | 3.7%
3 | 31055 | 6.3%
4 | 41762 | 8.4%
5 | 19378 | 3.9%
6 | 22204 | 4.5%
7 | 29457 | 6.0%
8 | 17328 | 3.5%
9 | 55551 | 11.2%
10 | 27053 | 5.5%

Maximum values

Value | Count | Frequency (%)
12 | 28335 | 5.7%
11 | 40962 | 8.3%
10 | 27053 | 5.5%
9 | 55551 | 11.2%
8 | 17328 | 3.5%
7 | 29457 | 6.0%
6 | 22204 | 4.5%
5 | 19378 | 3.9%
4 | 41762 | 8.4%
3 | 31055 | 6.3%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 157257
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.674189
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
Median: 2010
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.156380181
Coefficient of variation (CV): 0.003064897341
Kurtosis: 125.8876054
Mean: 2008.674189
Median Absolute Deviation (MAD): 3
Skewness: -7.89739704
Sum: 677969721
Variance: 37.90101693
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=23)

Common Values

Value | Count | Frequency (%)
2013 | 36852 | 7.4%
2012 | 36309 | 7.3%
2014 | 31055 | 6.3%
2005 | 27533 | 5.6%
2010 | 24534 | 5.0%
2011 | 24069 | 4.9%
2009 | 23971 | 4.8%
2008 | 23793 | 4.8%
2007 | 21245 | 4.3%
2006 | 20896 | 4.2%
Other values (13) | 67264 | 13.6%
(Missing) | 157257 | 31.8%

Minimum values

Value | Count | Frequency (%)
1900 | 427 | 0.1%
1961 | 462 | 0.1%
1990 | 2252 | 0.5%
1994 | 882 | 0.2%
1995 | 889 | 0.2%
1998 | 441 | 0.1%
1999 | 3548 | 0.7%
2000 | 4419 | 0.9%
2001 | 7043 | 1.4%
2002 | 11994 | 2.4%

Maximum values

Value | Count | Frequency (%)
2015 | 16844 | 3.4%
2014 | 31055 | 6.3%
2013 | 36852 | 7.4%
2012 | 36309 | 7.3%
2011 | 24069 | 4.9%
2010 | 24534 | 5.0%
2009 | 23971 | 4.8%
2008 | 23793 | 4.8%
2007 | 21245 | 4.3%
2006 | 20896 | 4.2%
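The minimum of 1900, the skewness of about -7.9 and the kurtosis of about 126 point to a few very old years dominating the left tail. One common rule of thumb, which the report itself does not apply, is Tukey's 1.5 × IQR fence. The sketch below applies it to the reported quartiles and the smallest values listed above.

```python
# Reported quartiles for CompetitionOpenSinceYear.
q1, q3 = 2006, 2013
iqr = q3 - q1

# Tukey's fences (a rule-of-thumb assumption, not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Ten smallest distinct years and their row counts, as listed in the report.
smallest_values = {
    1900: 427, 1961: 462, 1990: 2252, 1994: 882, 1995: 889,
    1998: 441, 1999: 3548, 2000: 4419, 2001: 7043, 2002: 11994,
}

# The largest listed year (2002) is above the fence, so the listing
# covers every value that falls below it.
flagged = {year: n for year, n in smallest_values.items() if year < lower_fence}
flagged_rows = sum(flagged.values())

print(lower_fence, upper_fence)
print(sorted(flagged), flagged_rows)
```

Under this rule five distinct years fall below the lower fence. Whether 1900 in particular is a placeholder rather than a real opening year cannot be decided from the report alone.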

Promo2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
1 | 252115 | 51.0%
0 | 242663 | 49.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 252115 | 51.0%
0 | 242663 | 49.0%

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 242663
Missing (%): 49.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.51320627
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
Median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.12903756
Coefficient of variation (CV): 0.6008979547
Kurtosis: -1.383594275
Mean: 23.51320627
Median Absolute Deviation (MAD): 13
Skewness: 0.08225705468
Sum: 5928032
Variance: 199.6297024
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=24)

Common Values

Value | Count | Frequency (%)
14 | 35930 | 7.3%
40 | 33099 | 6.7%
31 | 19578 | 4.0%
10 | 18688 | 3.8%
5 | 17406 | 3.5%
37 | 15627 | 3.2%
1 | 15611 | 3.2%
13 | 14958 | 3.0%
45 | 14936 | 3.0%
22 | 14374 | 2.9%
Other values (14) | 51908 | 10.5%
(Missing) | 242663 | 49.0%

Minimum values

Value | Count | Frequency (%)
1 | 15611 | 3.2%
5 | 17406 | 3.5%
6 | 451 | 0.1%
9 | 6161 | 1.2%
10 | 18688 | 3.8%
13 | 14958 | 3.0%
14 | 35930 | 7.3%
18 | 12935 | 2.6%
22 | 14374 | 2.9%
23 | 2182 | 0.4%

Maximum values

Value | Count | Frequency (%)
50 | 469 | 0.1%
49 | 416 | 0.1%
48 | 4052 | 0.8%
45 | 14936 | 3.0%
44 | 1345 | 0.3%
40 | 33099 | 6.7%
39 | 2557 | 0.5%
37 | 15627 | 3.2%
36 | 4436 | 0.9%
35 | 11137 | 2.3%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 242663
Missing (%): 49.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.758249
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
Median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.670213506
Coefficient of variation (CV): 0.0008302257524
Kurtosis: -1.057709888
Mean: 2011.758249
Median Absolute Deviation (MAD): 1
Skewness: -0.1188747565
Sum: 507194431
Variance: 2.789613156
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common Values

Value | Count | Frequency (%)
2011 | 56648 | 11.4%
2013 | 53489 | 10.8%
2014 | 41202 | 8.3%
2012 | 35876 | 7.3%
2009 | 32314 | 6.5%
2010 | 28202 | 5.7%
2015 | 4384 | 0.9%
(Missing) | 242663 | 49.0%

Minimum values

Value | Count | Frequency (%)
2009 | 32314 | 6.5%
2010 | 28202 | 5.7%
2011 | 56648 | 11.4%
2012 | 35876 | 7.3%
2013 | 53489 | 10.8%
2014 | 41202 | 8.3%
2015 | 4384 | 0.9%

Maximum values

Value | Count | Frequency (%)
2015 | 4384 | 0.9%
2014 | 41202 | 8.3%
2013 | 53489 | 10.8%
2012 | 35876 | 7.3%
2011 | 56648 | 11.4%
2010 | 28202 | 5.7%
2009 | 32314 | 6.5%

PromoInterval
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 242663
Missing (%): 49.0%
Memory size: 3.8 MiB

Length

Max length: 16
Median length: 15
Mean length: 15.18682744
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Feb,May,Aug,Nov
2nd row: Jan,Apr,Jul,Oct
3rd row: Feb,May,Aug,Nov
4th row: Feb,May,Aug,Nov
5th row: Feb,May,Aug,Nov

Common Values

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 147111 | 29.7%
Feb,May,Aug,Nov | 57902 | 11.7%
Mar,Jun,Sept,Dec | 47102 | 9.5%
(Missing) | 242663 | 49.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 147111 | 58.4%
Feb,May,Aug,Nov | 57902 | 23.0%
Mar,Jun,Sept,Dec | 47102 | 18.7%
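The two PromoInterval frequency tables show the same counts with different percentages. The figures are consistent with the first table dividing by all rows and the pie chart table dividing by non-missing rows only. The sketch below reproduces both sets of percentages from the reported counts; the variable names are illustrative.

```python
# Counts as reported for PromoInterval.
counts = {
    "Jan,Apr,Jul,Oct": 147_111,
    "Feb,May,Aug,Nov": 57_902,
    "Mar,Jun,Sept,Dec": 47_102,
}
missing = 242_663

non_missing = sum(counts.values())
n_rows = non_missing + missing

# Denominator = all rows (matches the Common Values table).
share_of_all = {k: round(100 * v / n_rows, 1) for k, v in counts.items()}

# Denominator = rows with a value (matches the pie chart table).
share_of_non_missing = {k: round(100 * v / non_missing, 1) for k, v in counts.items()}

print(share_of_all)
print(share_of_non_missing)
```

The same difference in denominators explains the diverging percentages in the Open, Promo and SchoolHoliday sections.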

Date
Categorical

HIGH CARDINALITY

Distinct: 577
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2014-02-02
2nd row: 2013-07-19
3rd row: 2013-11-24
4th row: 2013-03-01
5th row: 2013-12-17

Common Values

Value | Count | Frequency (%)
2013-02-12 | 905 | 0.2%
2014-05-09 | 902 | 0.2%
2013-01-14 | 901 | 0.2%
2013-07-28 | 900 | 0.2%
2013-05-13 | 899 | 0.2%
2013-09-16 | 899 | 0.2%
2013-06-14 | 898 | 0.2%
2014-03-05 | 897 | 0.2%
2013-08-01 | 896 | 0.2%
2013-01-10 | 896 | 0.2%
Other values (567) | 485785 | 98.2%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
2013-02-12 | 905 | 0.2%
2014-05-09 | 902 | 0.2%
2013-01-14 | 901 | 0.2%
2013-07-28 | 900 | 0.2%
2013-05-13 | 899 | 0.2%
2013-09-16 | 899 | 0.2%
2013-06-14 | 898 | 0.2%
2014-03-05 | 897 | 0.2%
2013-07-12 | 896 | 0.2%
2013-01-10 | 896 | 0.2%
Other values (567) | 485785 | 98.2%

DayOfWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 14838
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.993878401
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.997798516
Coefficient of variation (CV): 0.5002151582
Kurtosis: -1.247693749
Mean: 3.993878401
Median Absolute Deviation (MAD): 2
Skewness: 0.005397276137
Sum: 1916822
Variance: 3.991198912
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common Values

Value | Count | Frequency (%)
2 | 69077 | 14.0%
4 | 68851 | 13.9%
3 | 68809 | 13.9%
5 | 68433 | 13.8%
1 | 68422 | 13.8%
6 | 68186 | 13.8%
7 | 68162 | 13.8%
(Missing) | 14838 | 3.0%

Minimum values

Value | Count | Frequency (%)
1 | 68422 | 13.8%
2 | 69077 | 14.0%
3 | 68809 | 13.9%
4 | 68851 | 13.9%
5 | 68433 | 13.8%
6 | 68186 | 13.8%
7 | 68162 | 13.8%

Maximum values

Value | Count | Frequency (%)
7 | 68162 | 13.8%
6 | 68186 | 13.8%
5 | 68433 | 13.8%
4 | 68851 | 13.9%
3 | 68809 | 13.9%
2 | 69077 | 14.0%
1 | 68422 | 13.8%
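DayOfWeek has 14838 missing values while Date has none, so the missing weekdays could in principle be derived from the date. The sketch below assumes that DayOfWeek follows ISO numbering (1 = Monday, 7 = Sunday), which the report does not confirm. The helper names are hypothetical and the dates are the five sample rows from the Date section.

```python
from datetime import date
from typing import Optional


def day_of_week_from_date(iso_date: str) -> int:
    """Return the ISO weekday (1 = Monday ... 7 = Sunday) for a YYYY-MM-DD string."""
    return date.fromisoformat(iso_date).isoweekday()


def fill_day_of_week(iso_date: str, day_of_week: Optional[int]) -> int:
    """Keep an existing DayOfWeek value; derive it from the date only when missing."""
    if day_of_week is not None:
        return day_of_week
    return day_of_week_from_date(iso_date)


# Sample rows from the Date section of this report.
sample_dates = ["2014-02-02", "2013-07-19", "2013-11-24", "2013-03-01", "2013-12-17"]
derived = [day_of_week_from_date(d) for d in sample_dates]
print(derived)
```

Before imputing anything, it would be worth comparing the derived weekday with the recorded one on rows where both are present, to confirm the numbering assumption.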

Customers
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 3745
Distinct (%): 0.8%
Missing: 14900
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 628.8059215
Minimum: 0
Maximum: 7388
Zeros: 82004
Zeros (%): 16.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 398
Median: 604
Q3: 833
95-th percentile: 1362
Maximum: 7388
Range: 7388
Interquartile range (IQR): 435

Descriptive statistics

Standard deviation: 463.3450633
Coefficient of variation (CV): 0.7368649808
Kurtosis: 7.113851144
Mean: 628.8059215
Median Absolute Deviation (MAD): 218
Skewness: 1.597509155
Sum: 301750128
Variance: 214688.6476
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
0 | 82004 | 16.6%
560 | 1147 | 0.2%
555 | 1127 | 0.2%
517 | 1118 | 0.2%
528 | 1106 | 0.2%
582 | 1104 | 0.2%
625 | 1103 | 0.2%
576 | 1100 | 0.2%
603 | 1097 | 0.2%
571 | 1095 | 0.2%
Other values (3735) | 387877 | 78.4%
(Missing) | 14900 | 3.0%

Minimum values

Value | Count | Frequency (%)
0 | 82004 | 16.6%
18 | 1 | < 0.1%
36 | 1 | < 0.1%
40 | 1 | < 0.1%
50 | 1 | < 0.1%
60 | 1 | < 0.1%
61 | 1 | < 0.1%
64 | 1 | < 0.1%
68 | 1 | < 0.1%
74 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
7388 | 1 | < 0.1%
5387 | 1 | < 0.1%
5297 | 1 | < 0.1%
5112 | 1 | < 0.1%
5106 | 1 | < 0.1%
5063 | 1 | < 0.1%
5034 | 1 | < 0.1%
5028 | 1 | < 0.1%
5014 | 1 | < 0.1%
5013 | 1 | < 0.1%
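Since Sales and Customers are flagged as highly correlated, a rough sales-per-customer figure can be read off the reported sums. This is only an approximation: the two sums run over slightly different sets of non-missing rows (14813 rows lack Sales, 14900 lack Customers), and the ratio is an illustration rather than a statistic the report provides.

```python
# Reported sums and means for Sales and Customers.
total_sales = 2_719_827_153
total_customers = 301_750_128
mean_sales = 5666.719767
mean_customers = 628.8059215

# Ratio of sums and ratio of means; they differ slightly because the
# two columns are missing on different rows.
sales_per_customer = total_sales / total_customers
ratio_of_means = mean_sales / mean_customers

print(f"{sales_per_customer:.2f} {ratio_of_means:.2f}")
```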

Open
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 14775
Missing (%): 3.0%
Memory size: 3.8 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 398060 | 80.5%
0.0 | 81943 | 16.6%
(Missing) | 14775 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 398060 | 82.9%
0.0 | 81943 | 17.1%
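Open is profiled as a categorical whose values are the three-character strings 1.0 and 0.0. That is what a 0/1 flag typically looks like once missing values have forced it into a floating-point representation. Assuming that is the cause here, a cleaning step could map the raw values to an optional boolean. The function name and the sample inputs below are hypothetical.

```python
import math
from typing import Optional


def parse_flag(raw: Optional[str]) -> Optional[bool]:
    """Map '1.0'/'0.0' style strings to True/False; missing or NaN becomes None."""
    if raw is None or raw.strip() == "":
        return None
    value = float(raw)
    if math.isnan(value):
        return None
    return value == 1.0


# Hypothetical raw values illustrating each case.
raw_values = ["0.0", "1.0", None, "nan", "1"]
parsed = [parse_flag(v) for v in raw_values]
print(parsed)
```

Keeping missing values as None, rather than coercing them to False, preserves the 3.0% of rows for which the flag is unknown.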

Promo
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 14915
Missing (%): 3.0%
Memory size: 3.8 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
0.0 | 302306 | 61.1%
1.0 | 177557 | 35.9%
(Missing) | 14915 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 302306 | 63.0%
1.0 | 177557 | 37.0%

StateHoliday
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 14837
Missing (%): 3.0%
Memory size: 3.8 MiB
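The report only says that StateHoliday has an unsupported type and may need cleaning. One common cause of such a result is a column that mixes value types, for example numbers and strings for the same code. The sketch below assumes that situation and uses made-up values. It shows one way to normalise such a column to strings, with missing entries kept as None.

```python
from collections import Counter
from typing import Optional


def normalise_code(value: object) -> Optional[str]:
    """Return a string code, or None for missing values (None or NaN)."""
    if value is None:
        return None
    if isinstance(value, float):
        if value != value:          # NaN is the only float not equal to itself
            return None
        if value.is_integer():
            return str(int(value))  # 0.0 -> "0"
    return str(value).strip()


# Hypothetical mixed-type values; not taken from the report.
raw_values = [0, "0", "a", 0.0, None, "b", float("nan")]

types_before = Counter(type(v).__name__ for v in raw_values)
cleaned = [normalise_code(v) for v in raw_values]
print(types_before)
print(cleaned)
```

After such a normalisation the column holds a single value type, so a profiler would usually be able to treat it as categorical.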

SchoolHoliday
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 15033
Missing (%): 3.0%
Memory size: 3.8 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 396594 | 80.2%
1.0 | 83151 | 16.8%
(Missing) | 15033 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 396594 | 82.7%
1.0 | 83151 | 17.3%

Interactions

[Figures: interaction plots between pairs of variables]
2021-11-07T11:18:08.462687image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:15.664298image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:00.953056image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:07.717546image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:29.643737image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:43.297790image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:48.559340image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:53.707943image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:57.735803image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:01.750380image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:08.574190image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:15.804074image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:01.090584image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:07.850689image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:31.098307image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:43.437284image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:48.676299image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:53.831587image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:57.856952image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:01.859742image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:08.706838image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:15.941529image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:01.227158image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:07.982830image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:33.893403image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:43.579277image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:48.794482image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:53.951812image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:17:57.962565image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:01.968586image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/
2021-11-07T11:18:08.837156image/svg+xmlMatplotlib v3.4.2, https://matplotlib.org/

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
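
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample values are assumptions made for the example. Tied values are assumed to receive the mean of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical values: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), pearson_r(x, y))
```

Because the relationship in the sample is monotonic but cubic, ρ reaches its maximum while r stays below it, which is the distinction the description draws.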

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the slope does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
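
As a rough sketch of this formula and of the invariance property, the following plain-Python example computes r directly from the definition. The function name `pearson_r` and the sample values are assumptions made for illustration, not part of the report.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their
    standard deviations (the common 1/n factors cancel out)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical sample values.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors):
# r is expected to stay the same up to floating-point error.
x_shifted = [3 * a + 7 for a in x]
y_shifted = [0.5 * b - 2 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)
```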

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
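
The pair-counting description above can be sketched as follows. This follows the text literally (the variant usually called τ-a); library implementations typically apply a tie correction, so their results may differ when the data contain ties. The function name and sample values are assumptions made for the example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs.

    A pair of observations is concordant when both variables move in the
    same direction and discordant when they move in opposite directions.
    Pairs tied in either variable count as neither.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical values: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```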

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
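
A minimal sketch of a bias-corrected Cramér's V in plain Python is shown below. It assumes the commonly cited form of Bergsma's correction, in which φ² and the row and column counts are each adjusted by terms in 1/(n - 1); whether the report generator uses exactly this form is not confirmed here. The function name and the category labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long lists of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, row_count in row_totals.items():
        for col, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)

    # Assumed bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)

    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical labels: the second variable is fully determined by the first.
store_type = ["a", "b"] * 50
assortment = ["x", "y"] * 50
print(cramers_v_corrected(store_type, assortment))
```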

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
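
The nullity correlation shown in the heatmap can be approximated for a single pair of columns as the correlation between their missing-value indicators. The sketch below is an illustration under assumptions: missing values are represented as `None`, the function name `nullity_correlation` is hypothetical, and the sample columns only mimic the pattern of two fields that are missing together.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Correlation between the missing-value indicators of two columns.

    For two binary indicators, Pearson's r reduces to the phi coefficient
    computed from the 2x2 table of missing/present counts. Returns None
    when either column is entirely missing or entirely present, because
    the correlation is undefined in that case.
    """
    a_missing = [value is None for value in col_a]
    b_missing = [value is None for value in col_b]
    pairs = list(zip(a_missing, b_missing))

    both_missing = sum(a and b for a, b in pairs)
    only_a_missing = sum(a and not b for a, b in pairs)
    only_b_missing = sum(b and not a for a, b in pairs)
    both_present = sum(not a and not b for a, b in pairs)

    denom = sqrt(
        (both_missing + only_a_missing)
        * (only_b_missing + both_present)
        * (both_missing + only_b_missing)
        * (only_a_missing + both_present)
    )
    if denom == 0:
        return None
    return (both_missing * both_present - only_a_missing * only_b_missing) / denom


# Hypothetical columns that are always missing in the same rows.
since_week = [None, 5.0, 13.0, 45.0, None]
since_year = [None, 2013.0, 2010.0, 2009.0, None]
print(nullity_correlation(since_week, since_year))
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing.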

Sample

First rows

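| | df_index | Sales | Store | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval | Date | DayOfWeek | Customers | Open | Promo | StateHoliday | SchoolHoliday |
| 0 | 290482 | 0.0 | 524 | a | c | 40860.0 | 9.0 | 2013.0 | 0 | NaN | NaN | NaN | 2014-02-02 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
| 1 | 139915 | 6267.0 | 253 | a | c | 250.0 | NaN | NaN | 1 | 5.0 | 2013.0 | Feb,May,Aug,Nov | 2013-07-19 | 5.0 | 749.0 | 1.0 | 1.0 | 0 | 0.0 |
| 2 | 318657 | 0.0 | 575 | a | a | 960.0 | 5.0 | 2008.0 | 1 | 13.0 | 2010.0 | Jan,Apr,Jul,Oct | 2013-11-24 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
| 3 | 361617 | 5900.0 | 653 | d | c | 7520.0 | 7.0 | 2014.0 | 1 | 45.0 | 2009.0 | Feb,May,Aug,Nov | 2013-03-01 | NaN | 548.0 | 1.0 | 0.0 | 0 | 0.0 |
| 4 | 374646 | 8468.0 | 676 | b | b | 1410.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2013-12-17 | 2.0 | 1767.0 | 1.0 | 1.0 | 0 | 0.0 |
| 5 | 166 | 5011.0 | 1 | c | a | 1270.0 | 9.0 | 2008.0 | 0 | NaN | NaN | NaN | 2013-06-20 | 4.0 | 539.0 | 1.0 | 1.0 | 0 | 0.0 |
| 6 | 194544 | 5437.0 | 351 | a | a | 5290.0 | 11.0 | 2012.0 | 1 | 5.0 | 2013.0 | Feb,May,Aug,Nov | 2014-06-07 | 6.0 | 493.0 | 1.0 | 0.0 | 0 | 0.0 |
| 7 | 503508 | 8226.0 | 909 | a | c | 1680.0 | NaN | NaN | 1 | 45.0 | 2009.0 | Feb,May,Aug,Nov | 2013-01-04 | 5.0 | 876.0 | 1.0 | 0.0 | 0 | 1.0 |
| 8 | 45019 | 8085.0 | 82 | a | a | 22390.0 | 4.0 | 2008.0 | 1 | 37.0 | 2009.0 | Jan,Apr,Jul,Oct | 2013-03-15 | 5.0 | 831.0 | 1.0 | 0.0 | 0 | 0.0 |
| 9 | 29651 | 0.0 | 54 | d | c | 7170.0 | 8.0 | 2014.0 | 1 | 5.0 | 2013.0 | Feb,May,Aug,Nov | 2013-10-06 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |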

Last rows

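| | df_index | Sales | Store | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval | Date | DayOfWeek | Customers | Open | Promo | StateHoliday | SchoolHoliday |
| 494768 | 175203 | 5078.0 | 317 | d | a | 3140.0 | 7.0 | 2013.0 | 1 | 14.0 | 2011.0 | Jan,Apr,Jul,Oct | 2013-04-05 | 5.0 | 604.0 | 1.0 | 0.0 | 0 | 1.0 |
| 494769 | 87498 | 8277.0 | 158 | d | c | 11840.0 | NaN | NaN | 1 | 31.0 | 2009.0 | Feb,May,Aug,Nov | 2014-06-10 | 2.0 | 522.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494770 | 521430 | 6063.0 | 941 | a | a | 1200.0 | 12.0 | 2011.0 | 1 | 31.0 | 2013.0 | Jan,Apr,Jul,Oct | 2013-06-18 | 2.0 | 651.0 | 1.0 | 1.0 | 0 | 0.0 |
| 494771 | 137337 | 5581.0 | 248 | a | c | 340.0 | 9.0 | 2012.0 | 1 | 40.0 | 2012.0 | Jan,Apr,Jul,Oct | 2014-03-11 | 2.0 | 945.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 494772 | 54886 | 4860.0 | 99 | c | c | 2030.0 | 11.0 | 2003.0 | 1 | 22.0 | 2012.0 | Mar,Jun,Sept,Dec | 2014-04-26 | NaN | 464.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494773 | 110268 | 0.0 | 200 | a | a | 1650.0 | 10.0 | 2000.0 | 0 | NaN | NaN | NaN | 2013-05-12 | 7.0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |
| 494774 | 259178 | 4860.0 | 468 | c | c | 5260.0 | 9.0 | 2012.0 | 0 | NaN | NaN | NaN | 2013-02-11 | 1.0 | 603.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494775 | 365838 | 5691.0 | 660 | a | a | 1200.0 | 11.0 | 2006.0 | 1 | 40.0 | 2014.0 | Jan,Apr,Jul,Oct | 2014-01-10 | 5.0 | 618.0 | 1.0 | 1.0 | 0 | 0.0 |
| 494776 | 131932 | 4180.0 | 239 | d | c | 610.0 | NaN | NaN | 0 | NaN | NaN | NaN | 2013-02-23 | 6.0 | 432.0 | 1.0 | 0.0 | 0 | 0.0 |
| 494777 | 121958 | 5931.0 | 221 | d | c | 13530.0 | 9.0 | 2013.0 | 0 | NaN | NaN | NaN | 2013-06-06 | NaN | 590.0 | 1.0 | 1.0 | 0 | 0.0 |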